Conversation


@memodi memodi commented Oct 14, 2025

Description

NETOBSERV-2443: fix bug, improve cleanup and file writing

With the help of Claude, I was able to identify the flakiness coming from the pty handling and made a bunch of improvements, listed below:

  • Complete output capture - all lines are captured, with no race conditions
  • Proper timeout handling - API calls respect the polling context timeouts
  • Reliable cleanup - ignores SIGHUP and completes the deletion
  • Absolute paths for file reads
  • Clean up the output/flow directory after every test, so the next test won't read from the same file
  • Use the OCP-XXXX and test label combination to name the output files of the collector and cleanup commands

I made several runs, and the CLI tests are now much more stable.
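
For illustration, here is a minimal Go sketch of the SIGHUP-ignoring cleanup idea. The function name, command invocation, and timeout are assumptions for this sketch, not the exact code in e2e/integration-tests/cli.go:

```go
package clitest

import (
	"context"
	"fmt"
	"os/exec"
	"os/signal"
	"syscall"
	"time"
)

// runCleanup is a hypothetical helper: it ignores SIGHUP so that the pty
// closing does not interrupt the deletion, then runs the CLI cleanup command
// and waits for it to finish within a bounded timeout.
func runCleanup(kubeconfig string) error {
	// Ignore SIGHUP for the duration of the cleanup; restore afterwards.
	signal.Ignore(syscall.SIGHUP)
	defer signal.Reset(syscall.SIGHUP)

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// "oc netobserv cleanup" is assumed here to be the CLI's cleanup entry point.
	cmd := exec.CommandContext(ctx, "oc", "netobserv", "cleanup")
	cmd.Env = append(cmd.Environ(), "KUBECONFIG="+kubeconfig)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("cleanup failed: %w, output: %s", err, out)
	}
	return nil
}
```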

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).


openshift-ci bot commented Oct 14, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jotak for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@memodi memodi added the no-qe label Oct 14, 2025

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 13.82%. Comparing base (1654142) to head (d8a6355).

Files with missing lines Patch % Lines
e2e/integration-tests/cli.go 0.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #404      +/-   ##
==========================================
- Coverage   13.84%   13.82%   -0.02%     
==========================================
  Files          18       18              
  Lines        2731     2734       +3     
==========================================
  Hits          378      378              
- Misses       2329     2332       +3     
  Partials       24       24              
Flag Coverage Δ
unittests 13.82% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
e2e/integration-tests/cli.go 0.00% <0.00%> (ø)


memodi commented Oct 14, 2025

/test ?


openshift-ci bot commented Oct 14, 2025

@memodi: The following commands are available to trigger required jobs:

/test images
/test integration-tests

Use /test all to run all jobs.

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


memodi commented Oct 14, 2025

/test integration-tests

@memodi memodi requested a review from jpinsonneau October 16, 2025 21:00

memodi commented Oct 17, 2025

/test integration-tests


memodi commented Oct 17, 2025

Integration tests are failing because, for some reason, the CI cluster is taking too long to pull images.

/test integration-tests


memodi commented Oct 21, 2025

/test integration-tests

1 similar comment

memodi commented Oct 24, 2025

/test integration-tests

memodi and others added 4 commits October 24, 2025 11:45
- Increase waitDaemonset timeout from 50s to 5 minutes (30×10s)
  * CI environments often have slow image pulls
  * Previous timeout was too aggressive for registry operations

- Add comprehensive diagnostic output on pod startup failure:
  * Pod status with node placement (get pods -o wide)
  * Recent events to identify ImagePullBackOff, etc.
  * Pod event details from describe output
  * Daemonset logs if containers started

This helps diagnose ContainerCreating issues in CI where pods
fail to start due to image pull problems or resource constraints.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
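
A rough Go sketch of collecting that diagnostic output on failure; the helper name and the exact set of oc commands here are illustrative, not the repo's implementation:

```go
package clitest

import (
	"fmt"
	"os/exec"
)

// dumpDaemonsetDiagnostics is a hypothetical helper mirroring the diagnostics
// described above: on pod-startup failure it prints pod placement, recent
// events (to spot ImagePullBackOff and similar), and any available logs.
func dumpDaemonsetDiagnostics(namespace, daemonset string) {
	cmds := [][]string{
		{"oc", "get", "pods", "-n", namespace, "-o", "wide"},
		{"oc", "get", "events", "-n", namespace, "--sort-by=.lastTimestamp"},
		{"oc", "describe", "daemonset", daemonset, "-n", namespace},
		{"oc", "logs", "daemonset/" + daemonset, "-n", namespace, "--tail=50"},
	}
	for _, c := range cmds {
		out, err := exec.Command(c[0], c[1:]...).CombinedOutput()
		fmt.Printf("$ %v\n%s\n", c, out)
		if err != nil {
			// Best-effort diagnostics: keep going even if one command fails,
			// e.g. logs are unavailable because containers never started.
			fmt.Printf("(command failed: %v)\n", err)
		}
	}
}
```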
CI runs showed only 4/6 pods ready within the 5-minute timeout, indicating
that image pulls need more time. Increase the timeout to 10 minutes (60×10s)
to accommodate slower CI registry pulls and pod scheduling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
In E2E test mode, the bash script's waitDaemonset() could exit with
error after 10 minutes while the Go test's isDaemonsetReady() was
still polling. This created a race where:

1. Go test calls StartCommand() which runs bash script async
2. Bash script calls waitDaemonset() and waits 10 mins
3. Go test calls isDaemonsetReady() and waits 10 mins
4. If bash times out first, it calls exit 1, killing the process
5. Go test is left polling a dead command

Solution: When isE2E=true, skip the bash-level wait since the Go
test framework handles pod readiness checking via isDaemonsetReady().

For manual CLI usage (isE2E=false), the wait still runs as before
to provide user feedback.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
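
A rough sketch of the Go-side readiness polling that makes the bash-level wait redundant in E2E mode; the client setup, function signature, and exact intervals are assumptions, not the repo's helpers:

```go
package clitest

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// isDaemonsetReady polls until the capture daemonset reports all desired pods
// ready, or the timeout expires. When isE2E=true this is the only wait; the
// bash script's waitDaemonset is skipped, so there is no competing exit path.
func isDaemonsetReady(ctx context.Context, cs kubernetes.Interface, ns, name string) error {
	return wait.PollUntilContextTimeout(ctx, 10*time.Second, 10*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			ds, err := cs.AppsV1().DaemonSets(ns).Get(ctx, name, metav1.GetOptions{})
			if err != nil {
				// Not created yet (or transient API error): keep polling.
				return false, nil
			}
			desired := ds.Status.DesiredNumberScheduled
			return desired > 0 && ds.Status.NumberReady == desired, nil
		})
}
```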
Tests were failing because:
1. Commands ran with --max-time=1m in foreground mode
2. After 1 minute, capture finished and auto-cleanup ran
3. Cleanup deleted the daemonset
4. isDaemonsetReady() was polling for a deleted daemonset
5. Test failed with context deadline exceeded

Using --background mode prevents automatic cleanup when the
capture finishes, allowing the test to verify daemonset
privilege settings before cleanup runs.
Also, check that the CLI pod is running instead of just the daemonset.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
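
For illustration, a minimal sketch of starting the capture in background mode so the daemonset survives the capture window for inspection; the flags beyond --background and --max-time, and the helper name, are assumptions:

```go
package clitest

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// startBackgroundCapture launches the capture with --background so that the
// capture finishing does not trigger the CLI's automatic cleanup, leaving the
// daemonset in place for the test to inspect before an explicit cleanup.
func startBackgroundCapture(ctx context.Context) error {
	ctx, cancel := context.WithTimeout(ctx, 2*time.Minute)
	defer cancel()

	cmd := exec.CommandContext(ctx, "oc", "netobserv", "flows",
		"--background", "--max-time=1m")
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("failed to start capture: %w, output: %s", err, out)
	}
	// The test then polls for the CLI pod and daemonset to be running (see
	// isDaemonsetReady above), verifies the daemonset privilege settings,
	// and finally runs the explicit cleanup.
	return nil
}
```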

openshift-ci bot commented Oct 24, 2025

@memodi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/integration-tests
Commit: 553e707
Details: link
Required: true
Rerun command: /test integration-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
